Analysis and Optimization of the Hadoop Speculative Execution Mechanism
Authors
Abstract
Existing Hadoop clusters are mostly composed of heterogeneous nodes with different computing and storage capacities, so the speeds of map and reduce tasks performed on the nodes differ considerably. Because the finish time of the entire job is determined by its slowest task, the strategy for finding these "drag tasks" plays a dominant role in the whole job scheduling process. The current speculative execution mechanism of Hadoop falls short in finding drag tasks in time. In this paper we improve the speculative execution mechanism and apply the improved First-in-First-out (FIFO) scheduler to a Hadoop cluster. Experiments verify that the improved mechanism performs better when the Hadoop platform has many drag tasks, increasing cluster resource utilization and throughput.
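The core idea the abstract describes, estimating which running tasks will drag out the job's finish time and launching speculative copies of them, can be sketched as follows. This is a minimal illustration in the spirit of LATE-style straggler detection, not the paper's actual algorithm; the function names, the progress-rate estimator, and the 25% threshold are all illustrative assumptions.

```python
# Hedged sketch of "drag task" (straggler) detection for speculative execution.
# Assumption: each task reports a progress fraction in [0, 1] and elapsed seconds.

def estimate_time_left(progress, elapsed):
    """Estimate remaining time from a task's observed progress rate."""
    if progress <= 0:
        return float("inf")  # no progress yet: treat as the slowest possible
    rate = progress / elapsed          # progress per second so far
    return (1.0 - progress) / rate     # seconds to finish at that rate

def pick_speculative_candidates(tasks, slow_fraction=0.25):
    """Return the task IDs in the slowest `slow_fraction` by estimated finish time.

    tasks: dict mapping task_id -> (progress fraction, elapsed seconds).
    """
    remaining = {tid: estimate_time_left(p, e) for tid, (p, e) in tasks.items()}
    ranked = sorted(remaining, key=remaining.get, reverse=True)
    n = max(1, int(len(ranked) * slow_fraction))
    return ranked[:n]

# Example: three map tasks with equal elapsed time but unequal progress.
tasks = {"m1": (0.9, 10), "m2": (0.5, 10), "m3": (0.1, 10)}
print(pick_speculative_candidates(tasks))  # ['m3'] — m3 progresses slowest
```

The estimator assumes a roughly constant progress rate per task; the paper's contribution is precisely in making this kind of estimate more accurate on heterogeneous nodes, where a naive rate extrapolation can mislabel tasks.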
Similar Resources
Optimization Framework for Map Reduce Clusters on Hadoop’s Configuration
Hadoop is a Java-based distributed computing framework designed to support applications implemented via the MapReduce programming model. Hadoop performance, however, is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming. The existing system uses a Random forest approa...
Speculation-aware Resource Allocation for Cluster Schedulers
Resource allocation and straggler mitigation (via "speculative" copies) are two key building blocks for analytics frameworks. Today, the two solutions are largely decoupled from each other, losing the opportunity for joint optimization. Resource allocation across jobs assumes that each job runs a fixed set of tasks, ignoring their need to dynamically run speculative copies for stragglers. Cons...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment
Distributed computing has achieved tremendous development since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., the Internet of Things, Cyber-Physical Systems, Big Data analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of its core components, MapReduce facilitates allocating, p...
Budget based dynamic slot allocation for MapReduce clusters
MapReduce is a programming model for processing large amounts of data in the cloud, where resource allocation is an active research area since it is responsible for improving the performance of Hadoop. Resource allocation can be further improved by a set of mechanisms, including the budget-based HFS algorithm, where the fast worker node is identified first based...
Publication date: 2016